An appraisal is the process by which a lender determines the value of a home. Bias often enters this process when appraisers undervalue homes due to the race of the current homeowner or their neighbors, preventing homebuyers from receiving loans large enough to cover the contract price of a house.
In this analysis, appraisal equity in Illinois is evaluated through the use of two key variables from the Federal Housing Finance Agency (FHFA) Uniform Appraisal Dataset (UAD):
The percentage of appraisal values that are below the contract price in each tract (an undesirable outcome for a consumer).
The count of home purchase appraisals per year (averaged) in each tract.
Demographic data from the US Census Bureau are also included for comparison.
Please note that FHFA data were suppressed for about 23% of census tracts due to privacy concerns, and thus these tracts are not included in this analysis.
The following packages must be loaded in order for this script to run:
library(tidyverse)
library(tidycensus)
library(sf)
library(tmap)
library(leaflet)
library(plotly)
library(GGally)
Data from the UAD are downloaded from the FHFA datasets page. Tract-level data are used to show differences in appraisal values between local areas.
After data are downloaded, the following operations are done to reduce the size and increase the relevance of the dataset:
Note that the following code chunk is not run in this R Markdown
(hence the # before each line) in order to save time, as
the output file only must be created once.
## Load data, filter data, and drop unnecessary variables -- COMMENTED, AS THIS ONLY NEEDS TO BE RUN ONCE
# fhfa_uad_trct = read.csv("FHFA_UAD_tract/FHFA_UAD_tract.csv") %>%
# filter(STATEPOSTAL == "IL" & YEAR >= 2018 & SERIES %in% c("% of Appraisals Below Contract Price", "Count of Appraisals")) %>%
# select(SERIES, GEOID = TRACT, YEAR, VALUE, PURPOSE)
## Separate out the first variable and take the 5-year average
# fhfa_uad_trct_pctbw = filter(fhfa_uad_trct, SERIES == "% of Appraisals Below Contract Price") %>%
# select(-SERIES, -PURPOSE) %>%
# group_by(GEOID) %>%
# summarise(pctbw = mean(VALUE, na.rm = T)) %>%
# mutate_at(vars(pctbw), ~ifelse(is.nan(.), NA, .))
## Separate out the second variable and take the 5-year average
# fhfa_uad_trct_ctapr = filter(fhfa_uad_trct, SERIES == "Count of Appraisals") %>%
# filter(PURPOSE == "Purchase") %>%
# select(-SERIES, -PURPOSE) %>%
# group_by(GEOID) %>%
# summarise(ctapr = mean(VALUE, na.rm = T)) %>%
# mutate_at(vars(ctapr), ~ifelse(is.nan(.), NA, .))
## Merge the three tables back together
# fhfa_uad_trct1 = merge(fhfa_uad_trct_pctbw, fhfa_uad_trct_ctapr, by = "GEOID")
## Export as a .csv
# write.csv(fhfa_uad_trct1, "FHFA_UAD_tract/FHFA_UAD_tract_filtered.csv")
The following code chunk simply opens up the file created above,
removes an unnecessary index variable, and converts the
GEOID variable (census tract code) to string format to
allow for merging with census data later on.
# Load cleaned CSV created above
fhfa_uad_trct = read.csv("FHFA_UAD_tract/FHFA_UAD_tract_filtered.csv") %>%
select(GEOID, pctbw, ctapr)
# Change GEOID to character data type in order to allow for merging later on
class(fhfa_uad_trct$GEOID) = "character"
2020 census data on population and race are downloaded directly from
the census bureau using the tidycensus
R package. Variables fro the 2020 Decennial Census are total
population (P1_001N), total
non-Hispanic/Latino white population (P2_005N),
total Hispanic/Latino population
(P2_002N), and total non-Hispanic/Latino Black
population (P2_006N). One final variable,
median income (B19326_001), is taken from
a 5-year average of the American Community Survey from 2017-2021
(inclusive).
# Load decennial variables
il_trct_dec = get_decennial(
geography = "tract",
variables = c(popul = "P1_001N", white = "P2_005N", hispa = "P2_002N", black = "P2_006N"),
state = "IL",
year = 2020,
geometry = T,
output = "wide"
) %>%
filter(!st_is_empty(geometry)) # REMOVE EMPTY TRACTS!
## Getting data from the 2020 decennial Census
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the PL 94-171 Redistricting Data Summary File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## i Small counts should be interpreted with caution.
## i See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
# Load ACS variable
il_trct_acs = get_acs(
geography = "tract",
variables = c(mdinc = "B19326_001"),
state = "IL",
year = 2021,
survey = "acs5",
geometry = T,
output = "wide"
) %>%
filter(!st_is_empty(geometry)) %>% # REMOVE EMPTY TRACTS!
st_drop_geometry() %>% # REMOVE GEOMETRY (ALREADY PROVIDED FROM DECENNIAL VARIABLES)
select(GEOID, mdinc = mdincE) # REMOVE OTHER UNNECESSARY VARIABLES
## Getting data from the 2017-2021 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
First, the census tables and appraisal data are merged.
# Merge tables
il_trct = full_join(il_trct_dec, il_trct_acs, by = "GEOID") %>%
full_join(fhfa_uad_trct, by = "GEOID")
Next, each race variable is converted to a percent of the total tract population, then multiplied by 100 (this is needed in order to visualize the values as a percentage later on)
# Calculate percentages, then multiply these values and pctbw by 100
il_trct = il_trct %>% mutate_at(vars(c(white, hispa, black)), ~ (./popul)) %>%
mutate_at(vars(c(white, hispa, black, pctbw)), ~ (.*100))
To clean up, non-integer values are rounded and variables are relocated in a more sensible order.
# Round non-integer values
il_trct$white = round(il_trct$white, digits = 1)
il_trct$hispa = round(il_trct$hispa, digits = 1)
il_trct$black = round(il_trct$black, digits = 1)
il_trct$pctbw = round(il_trct$pctbw, digits = 1)
il_trct$ctapr = round(il_trct$ctapr, digits = 1)
# Rearrange variables
il_trct = il_trct %>%
relocate(mdinc, .after = black) %>%
relocate(pctbw, .after = mdinc) %>%
relocate(ctapr, .after = pctbw)
Finally, alternate versions of this table are created, the first with
no NA values, and the second with no NA or
0 values that is converted to logarithmic format. The first
will allow for better map visualizations, while the second will allow
for linear regressions with logarithmic transformation.
# No NA values
il_trct_noNA = na.omit(il_trct)
# No NA or 0 values - logarithmic transformation
il_trct_log = filter(il_trct_noNA, popul > 0 & white > 0 & hispa > 0 & black > 0 & pctbw > 0) %>%
mutate_at(vars(white, hispa, black, mdinc, pctbw, ctapr), ~ log(.))
In order to create interactive maps, package tmap is set to interactive viewing mode and labels with percentage and dollar symbols are created.
# Set tmap to interactive viewing
tmap_mode("view")
## tmap mode set to interactive viewing
# Create "%" and "$" string variables for labeling
il_trct_noNA = il_trct_noNA %>%
mutate(white_pct = paste0(sprintf("%.1f", white), "%")) %>%
mutate(hispa_pct = paste0(sprintf("%.1f", hispa), "%")) %>%
mutate(black_pct = paste0(sprintf("%.1f", black), "%")) %>%
mutate(mdinc_usd = paste0("$", as.character(format(mdinc, big.mark = ",", trim = T)))) %>%
mutate(pctbw_pct = paste0(sprintf("%.1f", pctbw), "%"))
Next, the maps are defined as an object, using data from the “no NA” version of the data table. This object is then converted to a leaflet map in order to change a few settings.
# Create popup label object
popup = c("2020 Population" = "popul", "2020 Percent Non-Hispanic/Latino White" = "white_pct", "2020 Percent Non-Hispanic/Latino Black" = "black_pct", "2020 Percent Hispanic/Latino" = "hispa_pct", "2017-2021 Median Income" = "mdinc_usd", "2018-2022 Percent of Appraisals Below Contract Price" = "pctbw_pct", "2018-2022 Annual Count of Appraisals" = "ctapr")
# Create map object
map = tm_shape(il_trct_noNA, name = "2020 Population by<br>Census Tract") +
tm_polygons(title = "2020 Population by<br>Census Tract", col = "popul", style = "jenks", palette = "Greys", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup) +
tm_shape(il_trct_noNA, name = "2020 Percent Non-<br>Hispanic/Latino White") +
tm_polygons(title = "2020 Percent Non-<br>Hispanic/Latino White", col = "white", style = "jenks", palette = "Reds", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
tm_shape(il_trct_noNA, name = "2020 Percent Non-<br>Hispanic/Latino Black") +
tm_polygons(title = "2020 Percent Non-<br>Hispanic/Latino Black", col = "black", style = "jenks", palette = "Oranges", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
tm_shape(il_trct_noNA, name = "2020 Percent<br>Hispanic/Latino") +
tm_polygons(title = "2020 Percent<br>Hispanic/Latino", col = "hispa", style = "jenks", palette = "YlOrBr", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
tm_shape(il_trct_noNA, name = "2017-2021 Median<br>Income") +
tm_polygons(title = "2017-2021 Median<br>Income", col = "mdinc", style = "jenks", palette = "Greens", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0("$", x))) +
tm_shape(il_trct_noNA, name = "2018-2022 Percent of<br>Appraisals Below<br>Contract Price") +
tm_polygons(title = "2018-2022 Percent of<br>Appraisals Below<br>Contract Price", col = "pctbw", style = "jenks", palette = "Blues", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup, legend.format=list(fun=function(x) paste0(x, "%"))) +
tm_shape(il_trct_noNA, name = "2018-2022 Annual<br>Count of Appraisals") +
tm_polygons(title = "2018-2022 Annual<br>Count of Appraisals", col = "ctapr", style = "jenks", palette = "Purples", alpha = 0.5, border.alpha = 0.1, id = "NAME", popup.format = c(html.escape = F), popup.vars = popup)
# Convert to leaflet
map_leaflet = tmap_leaflet(map)
# Make layers overlay and deselect all layers but one by default
map_leaflet1 = map_leaflet %>% addLayersControl(overlayGroups = c("2020 Population by<br>Census Tract", "2020 Percent Non-<br>Hispanic/Latino White", "2020 Percent Non-<br>Hispanic/Latino Black", "2020 Percent<br>Hispanic/Latino", "2017-2021 Median<br>Income", "2018-2022 Percent of<br>Appraisals Below<br>Contract Price", "2018-2022 Annual<br>Count of Appraisals")) %>%
hideGroup(c("2020 Population by<br>Census Tract", "2020 Percent Non-<br>Hispanic/Latino White", "2020 Percent Non-<br>Hispanic/Latino Black", "2020 Percent<br>Hispanic/Latino", "2017-2021 Median<br>Income", "2018-2022 Annual<br>Count of Appraisals"))
Use the map below to explore how demographic factors such as race and income are associated with appraisal values.
Hover over the layers menu on the right side of the map and use the check boxes within to select one or more data layers to view. You can also click on an individual tract to get more information about it.